Broadcast Attention Learning for Real Telephone Speech Keyword Spotting
Authors
Abstract
With the development of mobile smart devices, keyword spotting plays an important role in the interaction between machines and users. However, the low storage and energy consumption budgets of such devices limit the accuracy of keyword spotting tasks. Therefore, how to balance these resource constraints against high accuracy is a research hotspot for keyword spotting systems. Convolutional neural networks have been widely adopted in recent systems due to their superior accuracy, and the success of the transformer architecture in many areas demonstrates the effectiveness of self-attention. In this paper, we combine self-attention with convolutional neural networks and propose the broadcast attention learning network (BA-net), which uses a small number of parameters while achieving 97.18% and 78.44%, respectively, on the Google speech command dataset and a real telephone dataset.
Similar resources
Robust Multi-Keyword Spotting of Telephone Speech Using Stochastic Matching
In telephone speech recognition, the acoustic mismatch between the training and the test environment often causes severe degradation due to the channel distortion and ambient noise. In this paper, a two-level codebook-based stochastic matching (CBSM) is proposed to deal with the acoustic mismatch. For multi-keyword detection, we define a keyword relation table and a weighting function for reaso...
Semantic keyword spotting by learning from images and speech
We consider the problem of representing semantic concepts in speech by learning from untranscribed speech paired with images of scenes. This setting is relevant in low-resource speech processing, robotics, and human language acquisition research. We use an external image tagger to generate soft labels, which serve as targets for training a neural model that maps speech to keyword labels. We int...
Utterance verification using prosodic information for Mandarin telephone speech keyword spotting
In this paper, the prosodic information, a very special and important feature in Mandarin speech, is used for Mandarin telephone speech utterance verification. A two-stage strategy, with recognition followed by verification, is adopted. For keyword recognition, 59 context-independent subsyllables, i.e., 22 INITIAL’s and 37 FINAL’s in Mandarin speech, and one background/silence model, are used a...
Telephone speech multi-keyword spotting using fuzzy search algorithm and prosodic verification
In this paper a fuzzy search algorithm is proposed to deal with the recognition error for telephone speech. Since the prosodic information is a very special and important feature for Mandarin speech, we integrate the prosodic information into keyword verification. For multi-keyword detection, we define a keyword relation and a weighting function for reasonable keyword combinations. In the keywo...
Multi-keyword spotting of telephone speech using orthogonal transform-based SBR and RNN prosodic model
In this paper, orthogonal transform-based signal bias removal (OTSBR) approach and RNN prosodic model are proposed for multi-keyword spotting of telephone speech. OTSBR is employed in the pre-processing stage of acoustic decoding and aimed at channel bias estimation to eliminate the acoustic mismatch between training and testing environments. The RNN prosodic model is adopted in the post-proces...
Journal
Journal title: Journal of physics
Year: 2023
ISSN: 0022-3700, 1747-3721, 0368-3508, 1747-3713
DOI: https://doi.org/10.1088/1742-6596/2506/1/012003